Maximum Entropy Density Estimation with Incomplete Data
Abstract
We propose a natural generalization of Regularized Maximum Entropy Density Estimation (maxent) to handle input data with unknown values. Standard approaches to missing data usually estimate the unknown values first and then use the completed data as input. Our method avoids this two-step process and handles unknown values directly in the maximum entropy formulation.
The maxent method was recently proposed as an excellent method of presence-only prediction [2, 3]. In a presence-only framework, we are given a set X of data in which some of the data are labeled as positive. Unlike the typical classification framework, however, the remaining unlabeled instances are not necessarily negative; they are considered to be of unknown class. The regularized maxent method treats the positively labeled points as random draws from some hidden distribution over X and attempts to estimate that distribution. Specifically, regularized maxent seeks the distribution over X with maximum entropy such that the expected value of each feature is close to the observed mean of that feature over the positively labeled data.
Let F be an N×D matrix of features such that F_ij is the j-th feature of the i-th datum. Let the vector m be the means of the D features over the labeled positive data. Then the standard regularized maxent optimization is:
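The excerpt breaks off before the formulation itself, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes the box-constrained form often used in the regularized maxent literature: maximize the entropy of p over the N data points subject to |sum_i p_i F_ij - m_j| <= beta_j for every feature j. The dual of that problem is an L1-penalized fit of a Gibbs distribution p_i proportional to exp(F_i · lambda), which the sketch solves by proximal gradient descent. The names `fit_maxent`, `gibbs`, `beta`, and `n_iter`, as well as the synthetic data, are hypothetical and do not come from the source.

```python
import numpy as np


def gibbs(F, lam):
    """Gibbs distribution p_i proportional to exp(F_i . lam) over the N rows of F."""
    s = F @ lam
    s -= s.max()  # shift scores for numerical stability
    w = np.exp(s)
    return w / w.sum()


def fit_maxent(F, m, beta, n_iter=5000):
    """Proximal-gradient sketch for the dual of box-constrained maxent.

    Minimizes log Z(lam) - lam . m + sum_j beta_j * |lam_j| (assumed form).
    """
    lam = np.zeros(F.shape[1])
    # The Hessian of log Z is a feature covariance, so its largest eigenvalue
    # is bounded by the largest squared row norm of F.
    step = 1.0 / max(np.max(np.sum(F ** 2, axis=1)), 1e-12)
    for _ in range(n_iter):
        grad = F.T @ gibbs(F, lam) - m  # expected features minus target means
        lam = lam - step * grad
        # Soft-thresholding is the proximal step for the L1 penalty.
        lam = np.sign(lam) * np.maximum(np.abs(lam) - step * beta, 0.0)
    return gibbs(F, lam), lam


# Synthetic illustration: 50 points, 3 features, first 10 labeled positive.
rng = np.random.default_rng(0)
F = rng.random((50, 3))
m = F[:10].mean(axis=0)
beta = 0.05
p, lam = fit_maxent(F, m, beta)
print("largest deviation from m:", np.max(np.abs(F.T @ p - m)))
```

At the optimum of this assumed dual, each expected feature value lies within beta of the corresponding entry of m, and features whose uniform-distribution average already lies within beta keep a zero weight, so the estimate departs from uniform only as far as the constraints require.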
Similar resources
Maximum Entropy Density Estimation with Incomplete Presence-Only Data
We demonstrate a generalization of Maximum Entropy Density Estimation that elegantly handles incomplete presence-only data. We provide a formulation that is able to learn from known values of incomplete data without having to learn imputed values, which may be inaccurate. This saves the effort needed to perform accurate imputation while observing the principle of maximum entropy throughout the ...
Consistency and Generalization Bounds for Maximum Entropy Density Estimation
We investigate the statistical properties of maximum entropy density estimation, both for the complete data case and the incomplete data case. We show that under certain assumptions, the generalization error can be bounded in terms of the complexity of the underlying feature functions. This allows us to establish the universal consistency of maximum entropy density estimation.
Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price
In this paper, the continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method to obtain least-biased probability density function (Pdf) estimation. In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...
Maximum Entropy Formalism and Genetic Algorithms
The maximum entropy principle [1-3] is a powerful tool in the investigation of image reconstruction, spectral analysis, seismic inversion, inverse scattering, etc. It is proven to be the only consistent method for inferring from incomplete information. Here we show that the maximum entropy principle can be cast into an unconstrained optimization problem and therefore genetic algorithms [4, 5] ...
Quasi-continuous maximum entropy distribution approximation with kernel density
This paper extends maximum entropy estimation of discrete probability distributions to the continuous case. This transition leads to a nonparametric estimation of a probability density function, preserving the maximum entropy principle. Furthermore, the derived density estimate provides a minimum mean integrated square error. In a second step it is shown how boundary conditions can be included...